20 research outputs found

    Interactive visualisation and exploration of biological data


    Representing and analysing molecular and cellular function in the computer

    Determining the biological function of a myriad of genes, and understanding how they interact to yield a living cell, is the major challenge of the post genome-sequencing era. The complexity of biological systems is such that this cannot be envisaged without the help of powerful computer systems capable of representing and analysing the intricate networks of physical and functional interactions between the different cellular components. In this review we try to provide the reader with an appreciation of where we stand in this regard. We discuss some of the inherent problems in describing the different facets of biological function, give an overview of how information on function is currently represented in the major biological databases, and describe different systems for organising and categorising the functions of gene products. In a second part, we present a new general data model, currently under development, which describes information on molecular function and cellular processes in a rigorous manner. The model is capable of representing a large variety of biochemical processes, including metabolic pathways, regulation of gene expression and signal transduction. It also incorporates taxonomies for categorising molecular entities, interactions and processes, and it offers means of viewing the information at different levels of resolution, and dealing with incomplete knowledge. The data model has been implemented in the database on protein function and cellular processes 'aMAZE' (http://www.ebi.ac.uk/research/pfbp/), which presently covers metabolic pathways and their regulation. Several tools for querying, displaying, and performing analyses on such pathways are briefly described in order to illustrate the practical applications enabled by the model.

    Fast algorithms for computing sequence distances by exhaustive substring composition

    The increasing throughput of sequencing raises growing needs for methods of sequence analysis and comparison on a genomic scale, notably in connection with phylogenetic tree reconstruction. Such needs are hardly fulfilled by the more traditional measures of sequence similarity and distance, like string edit and gene rearrangement, due to a mixture of epistemological and computational problems. Alternative measures, based on the subword composition of sequences, have emerged in recent years and proved to be both fast and effective in a variety of tested cases. The common denominator of such measures is an underlying information-theoretic notion of relative compressibility. Their viability depends critically on computational cost. The present paper describes, as a paradigm, the extension and efficient implementation of one of the methods in this class. The method is based on the comparison of the frequencies of all subwords in the two input sequences, where frequencies are suitably adjusted to take into account the statistical background.
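
To make the idea of a composition-based distance concrete, here is a minimal sketch in Python. It is not the algorithm of the paper: it uses a single fixed subword length `k` instead of all subwords, it assumes an independent-letter background model, and it compares the adjusted profiles with a cosine-based distance. The function names and all three choices are assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequency of every length-k subword (overlapping windows)."""
    total = len(seq) - k + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: c / total for word, c in counts.items()}


def background_adjusted(seq, k):
    """Observed k-mer frequency minus the frequency expected under an
    independent-letter background (an assumed, very simple model).
    Subwords that never occur in seq are ignored in this sketch."""
    letter_freq = kmer_frequencies(seq, 1)
    adjusted = {}
    for word, observed in kmer_frequencies(seq, k).items():
        expected = 1.0
        for ch in word:
            expected *= letter_freq[ch]
        adjusted[word] = observed - expected
    return adjusted


def composition_distance(seq_a, seq_b, k=3):
    """One minus the cosine similarity of the two adjusted profiles."""
    a = background_adjusted(seq_a, k)
    b = background_adjusted(seq_b, k)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no usable signal in at least one profile
    return 1.0 - dot / (norm_a * norm_b)
```

Calling `composition_distance(s, s)` on a sequence with a non-trivial profile gives a value of zero up to floating-point rounding. The cost of this naive version grows with the number of distinct subwords, which is the cost the paper's efficient implementation is meant to address.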

    Frequency distribution of TATA Box and extension sequences on human promoters

    BACKGROUND: The TATA box is one of the most important transcription factor binding sites, but its exact sequences are still not very clear. RESULTS: In this study, we conduct a dedicated analysis of the frequency distribution of the TATA box and its extension sequences on human promoters. Sixteen TATA elements derived from the TATA box motif, TATAWAWN, are classified into three distribution patterns: peak, bottom-peak, and bottom. Fourteen TATA extension sequences are predicted to be new TATA box elements because of their high motif factors, which indicate their statistical significance. Statistical analysis of the promoters of mouse, zebrafish and Drosophila melanogaster verifies seven of these elements. It is also observed that the distribution of TATA elements on the promoters of housekeeping genes is very similar to their distribution on the promoters of tissue-specific genes in human. CONCLUSION: The dedicated statistical analysis of the TATA box and its extension sequences yields new TATA elements. The statistical significance of these elements has been verified on random data sets by calculating their p-values.
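
The count of sixteen elements follows from the IUPAC codes in TATAWAWN: W stands for A or T and N for any base, so the motif expands to 2 × 2 × 4 = 16 concrete sequences. The sketch below, in Python, expands the motif and tallies where one element starts within a set of promoter sequences. The function names and the example sequences are hypothetical, and the study's motif factor and significance tests are not reproduced here.

```python
from collections import Counter
from itertools import product

# Only the IUPAC codes needed for TATAWAWN are listed.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}


def expand_motif(motif):
    """List every concrete sequence matched by an IUPAC motif."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in motif))]


def position_counts(promoters, element):
    """Count, per start position, how often element occurs across promoters.
    Overlapping occurrences are counted."""
    counts = Counter()
    for seq in promoters:
        start = seq.find(element)
        while start != -1:
            counts[start] += 1
            start = seq.find(element, start + 1)
    return counts


elements = expand_motif("TATAWAWN")
# Made-up sequences, for illustration only.
toy_promoters = ["GGTATAAAAGCC", "TATAAAAGTATAAAAG"]
histogram = position_counts(toy_promoters, "TATAAAAG")
```

With real promoters aligned on the transcription start site, plotting such a histogram per element is one way to see whether it follows a peak or a bottom pattern.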

    Clusters of Conserved Beta Cell Marker Genes for Assessment of Beta Cell Phenotype

    The aim of this study was to establish a gene expression blueprint of pancreatic beta cells conserved from rodents to humans and to evaluate its applicability to assess shifts in the beta cell differentiated state. Genome-wide mRNA expression profiles of isolated beta cells were compared to those of a large panel of other tissue and cell types, and transcripts with beta cell-abundant and -selective expression were identified. Iteration of this analysis in mouse, rat and human tissues generated a panel of conserved beta cell biomarkers. This panel was then used to compare isolated versus laser capture microdissected beta cells, monitor adaptations of the beta cell phenotype to fasting, and retrieve possible conserved transcriptional regulators.

    Identification of gene targets against dormant phase Mycobacterium tuberculosis infections

    BACKGROUND: Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), infects approximately 2 billion people worldwide and is the leading cause of mortality due to infectious disease. Current TB therapy involves a regimen of four antibiotics taken over a six-month period. Patient compliance, cost of drugs and increasing incidence of drug-resistant M. tuberculosis strains have added urgency to the development of novel TB therapies. Eradication of TB is affected by the ability of the bacterium to survive up to decades in a dormant state, primarily in hypoxic granulomas in the lung, and to cause recurrent infections. METHODS: The availability of M. tuberculosis genome-wide DNA microarrays has led to the publication of several gene expression studies under simulated dormancy conditions. However, no single model best replicates the conditions of human pathogenicity. In order to identify novel TB drug targets, we performed a meta-analysis of multiple published datasets from gene expression DNA microarray experiments that modeled infection leading to and including the dormant state, along with data from genome-wide insertional mutagenesis that examined gene essentiality. RESULTS: Based on the analysis of these data sets following normalization, several genome-wide trends were identified and used to guide the selection of targets for therapeutic development. The trends included the significant up-regulation of genes controlled by devR, down-regulation of protein and ATP synthesis, and the adaptation of two-carbon metabolism to the hypoxic and nutrient-limited environment of the granuloma. Promising targets for drug discovery were several regulatory elements (devR/devS, relA, mprAB) and enzymes involved in redox balance and respiration, sulfur transport and fixation, and pantothenate, isoprene, and NAD biosynthesis. The advantages and liabilities of each target are discussed in the context of enzymology, bacterial pathways, target tractability, and drug development. CONCLUSION: Based on our bioinformatics analysis and additional discussion of in-depth biological rationale, several novel anti-TB targets have been proposed as potential opportunities to improve present therapeutic treatments for this disease.

    A complete workflow for the analysis of full-size ChIP-seq (and similar) data sets using peak-motifs.

    This protocol explains how to use the online integrated pipeline 'peak-motifs' (http://rsat.ulb.ac.be/rsat/) to predict motifs and binding sites in full-size peak sets obtained by chromatin immunoprecipitation-sequencing (ChIP-seq) or related technologies. The workflow combines four time- and memory-efficient motif discovery algorithms to extract significant motifs from the sequences. Discovered motifs are compared with databases of known motifs to identify potentially bound transcription factors. Sequences are scanned to predict transcription factor binding sites and analyze their enrichment and positional distribution relative to peak centers. Peaks and binding sites are exported as BED tracks that can be uploaded into the University of California Santa Cruz (UCSC) genome browser for visualization in the genomic context. This protocol is illustrated with the analysis of a set of 6,000 peaks (8 Mb in total) bound by the Drosophila transcription factor Krüppel. The complete workflow is achieved in about 25 min of computational time on the Regulatory Sequence Analysis Tools (RSAT) Web server. This protocol can be followed in about 1 h.

    Accurate Theoretical Studies of Small Elemental Clusters
